-
Notifications
You must be signed in to change notification settings - Fork 676
feat: Add base64 and HTTP image URL support to vLLM workers #4114
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Krishnan Prashanth <[email protected]>
Signed-off-by: Krishnan Prashanth <[email protected]>
WalkthroughThis pull request adds multimodal image data support to the vLLM handler system. It introduces image loading and data extraction logic in the core handlers, updates deployment scripts to reflect a simplified or alternative EPD architecture for multimodal inference, and extends test configurations with image URLs and base64-encoded test data. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
lib/bindings/python/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (4)
components/src/dynamo/vllm/handlers.py(5 hunks)examples/backends/vllm/launch/agg_multimodal.sh(3 hunks)examples/backends/vllm/launch/agg_multimodal_epd.sh(1 hunks)tests/serve/test_vllm.py(5 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-16T19:47:30.312Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.
Applied to files:
components/src/dynamo/vllm/handlers.py
📚 Learning: 2025-10-28T04:09:48.264Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py:66-72
Timestamp: 2025-10-28T04:09:48.264Z
Learning: In components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, the AutoTokenizer.from_pretrained call with trust_remote_code=True is intentional and expected for the vLLM multimodal handler implementation.
Applied to files:
components/src/dynamo/vllm/handlers.py
🧬 Code graph analysis (1)
tests/serve/test_vllm.py (1)
tests/utils/payload_builder.py (1)
chat_payload(81-108)
🪛 Ruff (0.14.3)
components/src/dynamo/vllm/handlers.py
149-149: Use logging.exception instead of logging.error
Replace with exception
(TRY400)
tests/serve/test_vllm.py
31-31: Probable use of requests call without timeout
(S113)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: trtllm (amd64)
- GitHub Check: operator (amd64)
- GitHub Check: Build and Test - dynamo
milesial
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Krishnan Prashanth <[email protected]>
Signed-off-by: Krishnan Prashanth <[email protected]>
indrajit96
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work with E2E PR!
Left a few questions and minor comments.
LGTM!
krishung5
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
Signed-off-by: Krishnan Prashanth <[email protected]>
Signed-off-by: Krishnan Prashanth <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove test_multimodal.sh from the root of the repo. It shouldn't be at top level. It should either be removed or put in some test utils type of folder.
Added for ease of cluster testing. Will clean up before merging. |
Signed-off-by: Krishnan Prashanth <[email protected]>
…o#4114) Signed-off-by: Krishnan Prashanth <[email protected]>
| fi | ||
|
|
||
| # Start processor (Python-based preprocessing, handles prompt templating) | ||
| python -m dynamo.vllm --multimodal-processor --model $MODEL_NAME --mm-prompt-template "$PROMPT_TEMPLATE" & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question for next week - if the worker/backend supports PreprocessedRequest.multi_modal_data now, do we need this multimodal-preprocessor that explicitly registers as expecting ModelInput.Text so it can do the processing itself?
ref:
dynamo/components/src/dynamo/vllm/main.py
Line 555 in 51c4fe6
| ModelInput.Text, # Custom processor is used and this type bypasses SDK processor |
Overview
Enables vLLM backend workers to process base64 data URL images.
Backend workers now extract image URLs from
PreprocessedRequest.multi_modal_data(added in #3733) and decode them usingImageLoader:data:image/*;base64,<encoded>→ Decoded toPIL.Imagehttp://→ Fetched and loaded asPIL.ImageRelated PRS:
_extract_multimodal_data()with a NIXL read.Details
Modified:
components/src/dynamo/vllm/handlers.py_extract_multimodal_data()methodScripts:
agg_multimodal.sh- Standard deployment using Rust preprocessoragg_multimodal.sh→agg_multimodal_epd.sh- EPD architecture (preserved)Tests:
_epdsuffix (e.g.,multimodal_agg_qwen→multimodal_agg_qwen_epd)multimodal_agg_qwentest using standard deploymentWhere should the reviewer start?
handlers.py:116-168- Extraction logictest_vllm.py:166-195- Test validationSummary by CodeRabbit
New Features
Tests